`PyArrayDescr` methods like in `np.dtype` + some missing stuff in `npyffi` #261

aldanor · 2022-01-19T16:41:49Z

The npyffi stuff has been pulled out of pyo3-type-desc.
Methods in PyArrayDescr mostly replicate np.dtype object, plus a few methods replicating C macros (this will greatly simplify writing tests in pyo3-type-desc, plus makes PyArrayDescr object somewhat useful).

aldanor · 2022-01-19T16:42:32Z

There's still a few TODOs and questions around that will need to be resolved, plus I wanted to add at least some trivial tests for each of the methods, and add a new() that replicates np.dtype() constructor.

aldanor · 2022-01-19T16:52:23Z

Another thing I don't like is how typenums are encoded right now - maybe clean it up a bit?

NPY_TYPES enum contains NPY_BOOL = 0, ... NPY_CHAR = 26, NPY_USERDEF = 256.
The NPY_TYPES enum is marked as u32, however in C code it's always a c_int
Technically, this enum does not cover all typenums, because userdef typenums may extend past the NPY_USERDEF (according to NPY_NUMUSERTYPES constant)
We could replicate PyTypeNum_IS* (e.g. is_unsigned(), is_integer()) etc on the typenum, so it would have at least some use in a public API of PyArrayDescr, but then it would need a separate type.

What could be done:

Remove NPY_USERDEF from NPY_TYPES, replace it with a const (or with a const inside the impl)
What should be the repr of NPY_TYPES? (is it c_int or c_uint?)
Add enum TypeNum { Native(NPY_TYPES), UserDef(c_int) } (or UserDef(u32)?)
Add all of the PyTypeNum_IS*() macros as methods on TypeNum, e.g. is_integer())
Add From<> conversions from int to TypeNum and vice-versa (leq than 26 => native, otherwise userdef; not exactly precise but should be good enough)

From the public API perspective, you could then do dtype.num().is_integer() which is nicer than messing with c_int and flags.

src/npyffi/flags.rs

src/npyffi/array.rs

src/npyffi/types.rs

src/dtype.rs

adamreichold · 2022-01-19T17:57:55Z

The NPY_TYPES enum is marked as u32, however in C code it's always a c_int

I don't think that it is possible to reference the type alias in a repr directive which is probably why it has this representation in the first place.

so it would have at least some use in a public API of PyArrayDescr, but then it would need a separate type.

Personally, I would actually prefer it not becoming part of the API again if possible. Put differently, I see API surface as a liability that should be avoided if possible.

aldanor · 2022-01-19T23:16:51Z

@adamreichold Fixed all todos and a few things noted above, let me know if you think anything else is needed to change/add/remove.

(I still need to write tests for all of this; although it's all quite trivial it should all get triggered at least once somewhere in the test suite.)

As for fields - I've changed them to an iterator of unwrapped items in this commit - BUT, here's another idea: what if we change it to a simple

pub fn get_field(&self, name: &str) -> PyResult<(&PyArrayDescr, usize)>

And then we don't have maps, iterators and excessive unwrapping, it all lands onto the user and we keep it reasonably simple. Here, get_ prefix indicates that some non-trivial action will be performed and in this case it's also fallible (btw, to be consistent with other names here, I'd also suggest to rename get_type() to typeobj() or something since type is a keyword, since it doesn't do anything and just returns self->typeobj).

Given that there's already fn names(&self) -> Option<Vec<&str>>, this should be more than sufficient.

davidhewitt

👍 just skim read, got a few quick thoughts. Thanks!

~~Would it be worth adding some tests for the new dtype methods?~~ edit: I see you mentioned that above!

src/npyffi/types.rs

src/npyffi/array.rs

src/dtype.rs

davidhewitt · 2022-01-21T00:24:49Z

(I still need to write tests for all of this; although it's all quite trivial it should all get triggered at least once somewhere in the test suite.)

If there's interest, you might want to add a coverage job to CI. You can steal configuration from pyo3 :)

aldanor · 2022-01-21T00:33:34Z

If there's interest, you might want to add a coverage job to CI. You can steal configuration from pyo3 :)

Looks pretty nice, good point! (never had a chance to use ci coverage with rust in particular for some reason, but from a brief look at pyo3's codecov reports looks pretty cool). I'll try to open a separate pr for that tomorrow here.

aldanor · 2022-01-22T03:30:24Z

Ok:

Added tests for all of the methods (on a scalar, subarray and a struct type)
Removed subdtype and is_unsized, they are redundant
Replaced fields with get_field
Fixed base and its return type (no more Option)
Added PyArrayDescr:::get_type() back as hidden/deprecated
(various cleanup etc)

aldanor · 2022-01-22T04:01:10Z

Ok, clippy's finally happy. Added a changelog entry as well. So unless there's further comments this should be good to go.

adamreichold

Ok, clippy's finally happy. Added a changelog entry as well. So unless there's further comments this should be good to go.

I think so too. Will leave it open until tomorrow if anybody else wants to have a look.

(It is a matter of personal preferences, but I do find the changelog entry a bit on the wordy end of the scale. This could undermine the changelog's purpose of quickly getting an overview of the changes. The argument does not seem strong enough to me to ask for changes, I just wanted to mention it for consideration.)

kngwyu

Well done, thank you!

src/dtype.rs

src/npyffi/types.rs

aldanor · 2022-01-22T14:41:10Z

(It is a matter of personal preferences, but I do find the changelog entry a bit on the wordy end of the scale. This could undermine the changelog's purpose of quickly getting an overview of the changes. The argument does not seem strong enough to me to ask for changes, I just wanted to mention it for consideration.)

@adamreichold I've noticed that as well. I guess I'm just used to more lengthy changelogs like the one we keep in the hdf5 crate. For me personally, if I'm an active user of some crate, I would be more than happy if the devs shared all the new functionality and features that may be relevant to me as a user. As long as it's written in a structured form in human language and doesn't look like Arrow changelog.

In cases like the one here, I suggest to always commit as much information as possible to the changelog, and before the release to shorten it down if needed and prettify it (e.g. "10 new methods added to PyArrayDescr to match np.dtype API, see docs for details"). It's usually easier to do it this way than the other way around (hunting through other contributors' commits in the commit history to see if there's anything worth noting in the changelog before the release).

Looking at the current rust-numpy's changelog, it may look as if there has been near zero development from 0.12 to 0.15 (all notes are brief 'fixed bug', 'version bump'). And yet there's around a hundred commits and a substantial a lot of work. In fact, I'd even go as far as update the previous entries in the changelog to expand them slightly and make the changelog more self-explanatory (e.g. instead of 'added einsum' -> 'added einsum!() macro equivalent to calling np.einsum()`, etc).

But anyways, I'm more of a newcomer here so that's my personal 2 cents on the subject :)

aldanor added 8 commits January 19, 2022 12:17

ffi: add PyArray_{malloc,realloc,free} re-exports

46d7318

ffi: add a few char-like enums

7d913d6

ffi: add two datetime-metadata-related structs

730867c

ffi: add descriptor-related flags

5a78a87

fixup pub type char enums

38c4282

Rename PyArrayDescr::get_typenum -> num

b30701c

Derive PartialOrd/Ord for NPY_TYPES

5a1a91d

Add many PyArrayDescr methods like in np.dtype

45f6d4a